ETL - ELT
ETL - Extract, Transform, Load
ETL is a process that extracts data from a source, transforms it into a format that is suitable for analysis, and then loads it into a storage.
ELT - Extract, Load, Transform
ELT is a process that extracts data from a source, loads it into a storage, and then transforms it into a format that is suitable for analysis.
Processing Steps
Extract
The first step in the ETL process is to extract data from a source. This source could be a database, a file, or an API. The data is extracted in its raw form, and is not yet ready for analysis.
Here is different ways to extract data:
-
Progressive Extraction: Extracting data incrementally, for example, only extracting data that has been added or updated since the last extraction.
-
Full Extraction: Extracting all data from the source, regardless of whether it has been added or updated since the last extraction.
-
Notification Updates: Using notifications to trigger an extraction when data has been added or updated.
Transform
The next step in the ETL process is to transform the data into a format that is suitable for analysis, data science, or machine learning. This could involve cleaning the data, removing duplicates, and converting data types.
Here are some common transformations:
- Cleaning: Removing or correcting errors in the data.
- Deduplication: Removing duplicate records from the data.
- Normalization: Converting data into a standard format.
- Aggregation: Combining data from multiple sources into a single dataset.
- Derivation: Creating new columns or features from existing data.
- Filtering: Removing unwanted data from the dataset.
- Joining: Combining data from multiple tables or datasets.
- ...
Load
The final step in the ETL process is to load the transformed data into a storage. This storage could be a database, a data warehouse, or a data lake. The data is now ready for analysis, data science, or machine learning.
Here are some common ways to load data:
- Full Load: Loading all the transformed data into the storage.
- Incremental Load: Loading only the data that has been added or updated since the last load.
- Icremental Load Streaming: Loading data in real-time as it is extracted and transformed.
ETL vs ELT
ETL and ELT are two different approaches to data processing. The main difference between the two is the order of steps in the process.
Common Points
- Objective: ETL and ELT have the same objective, who is to enable companies to exploit their data.
- Automation: Both ETL and ELT can be automated using tools and platforms.
- Data Governance: Both ETL and ELT require data governance to ensure data quality and compliance.
Impacts of Differences
Aspect | ETL | ELT |
---|---|---|
Disponibility | Need to know in advance what you plan to do with your data | Store the data and decide later how to use it |
Flexibility | Less flexible because data is transformed before being loaded | More flexible as data is loaded, u can choose how to use it later |
Accessibility | Data is transformed before loading, which can make access harder | Data is stored in raw form, making it more accessible in some cases |
Evolution | Less evolutive as transformations are predefined | More evolutive as you can store data and decide later how to use it |
Storage | Requires less storage as only transformed data is stored | Requires more storage as raw data is stored |
Which to Choose ?
- Syncronization: If you need to synchronize data between different systems, ETL is the best choice.
- Modernization: If you want to modernize your data architecture, ELT is the best choice for the migration.
- Data Volume: If you have a large volume of data to treat, ELT is the best choice because you can store the data and decide when to transform it.
- Vitesse Access: If you need to access the data quickly, ELT can be useful because this may involve less unnecessary delay in accessing data.